ABSTRACT

The current DoD target acquisition models have two primary deficiencies: they use simplistic
representations of the vehicle and background signatures, and a highly simplified description of the human
observer. The current signature representation often fails for complex signature configurations and yields
inaccurate detectability and marginal pay-off predictions for low signature vehicles. In addition, it is not
extensible to false alarms and temporal cues, and precludes applications to vehicle design guidance and
diagnosis. The current human observer model is simplified to the same degree as the signature
representation, and as such does not extend to high fidelity target/background signature representations.

In answer to these deficiencies, we have developed the TARDEC Visual Model (TVM), which is based
upon emerging academic computational vision models (CVM). Recent advances in CVM have dramatically
improved the understanding of early human vision processes. A model of neural receptive fields
includes a generic image representation of the spatial processing characteristics for early vision cortical areas.
An input image is first divided into its three color opponent components, with each axis further decomposed
into a set of band pass spatial frequency filters (Gabor or wavelet transform filters) with different center
frequencies and orientations. Signal to noise statistics are then calculated on each channel and appropriately
aggregated over all channels using signal detection theory to predict probabilities of detection and false
alarm.


Present Assignment: Senior Research Physicist - GS15, Survivability Technology Center, US Army
Tank-Automotive R&D Center (TARDEC), Warren, MI

Past Experience: 22 years as a TARDEC research scientist.

Degrees Held: BS Physics - Iowa State U. & Ph.D. Physics - Wayne State U., Detroit, MI

Biography: Mr. Thomas Meitzler

Present Assignment: Research Physicist - GS13, Survivability Technology Center, US Army
Tank-Automotive R&D Center (TARDEC), Warren, MI

Past  Experience:  Adjunct  Professor  at  U.  of  Michigan  and  Henry  Ford  Community  College  (84-89)  and 
Research  Associate  in  the  Neurology  Dept  at  Henry  Ford  Hospital  (86-87) 

Degrees Held: BS and MS in Physics at Eastern Michigan U. and currently a Ph.D. candidate in EE at
Wayne State U.


Biography:  Mr.  Gary  Witus 

Present Assignment: Senior Scientist, OptiMetrics Inc., Ann Arbor, Michigan (92-94)

Past Experience: Engineering Chief at General Dynamics Land Systems Div. in Sterling Heights, MI
(88-92) and Program Director at Vector Research, Ann Arbor, MI (79-88).

Degrees Held: BS in Mathematics and MS in Industrial and Operations Engineering at the U. of Michigan,
Ann Arbor, MI


Report  Documentation  Page 

Form  Approved 

OMB  No.  0704-0188 


1. Report date: 20 JUN 1994
2. Report type: N/A
3. Dates covered:
4. Title and subtitle: Computational Vision Modeling for Target Detection
5a. Contract number:
5b. Grant number:
5c. Program element number:
5d. Project number:
5e. Task number:
5f. Work unit number:
6. Author(s): Dr. Grant Gerhart; Mr. Thomas Meitzler; Mr. Gary Witus
7. Performing organization name(s) and address(es): US Army RDECOM-TARDEC, 6501 E 11 Mile Rd, Warren, MI 48397-5000
8. Performing organization report number: 18749
9. Sponsoring/monitoring agency name(s) and address(es):
10. Sponsor/monitor's acronym(s): TACOM/TARDEC
11. Sponsor/monitor's report number(s): 18749
12. Distribution/availability statement: Approved for public release, distribution unlimited
13. Supplementary notes: Presented at the 19th Army Science Conference, 20-24 June 1994, Orlando, FL, USA
14. Abstract:
15. Subject terms:
16. Security classification of: a. report: unclassified; b. abstract: unclassified; c. this page: unclassified
17. Limitation of abstract: SAR
18. Number of pages: 8
19a. Name of responsible person:

Standard Form 298 (Rev. 8-98)
Prescribed by ANSI Std Z39-18


Gerhart, Meitzler, Witus


Computational  Vision  Modeling  for  Target  Detection 

Dr.  Grant  Gerhart 
Mr.  Thomas  Meitzler 
US  Army  TARDEC 
Warren, MI

Mr.  Gary  Witus 
OptiMetrics  Inc. 

Ann Arbor, MI


1.  INTRODUCTION 

There are many uses for accurate and realistic models describing human observer decision
performance for image representations of target/background scenes. Military visual search and detection
tasks are the most obvious uses, but important dual use applications exist, including highway driving. The
models have important utility in the design and evaluation of collision avoidance technology and paint
patterns for aircraft and highway barricades. In addition, other applications such as machine vision and
photo interpretation also require high fidelity predictive models of human vision system performance.

Current DoD target acquisition models use ad hoc signature metrics [1, 2] to quantify and describe
specific cue features inherent in complex imagery. These metrics typically model scene content using 1st
and 2nd order statistical parameters. Figure 1 compares two vehicle thermal images and their corresponding
Fourier transform (FT) images. The amplitude and phase of the FT images have been switched before inverse
transforming back to the spatial domain. At the bottom are the two resulting images, which look similar to the
originals except for some noise degradation. In both cases the phase dominates in importance over the
amplitude information, as evidenced by the primary cue features, which follow the phase image. Ad hoc
signature metrics typically use parameters such as the mean and RMS ΔT of the target and local background.
These metrics do not contain any phase information and therefore cannot be accurately used to predict
human performance.
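
The amplitude/phase exchange described for Figure 1 can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names `dft2` and `swap_amplitude_phase` are hypothetical, and a naive discrete Fourier transform is assumed so that the example needs only the standard library (it is practical for very small images only).

```python
import cmath
import math

def dft2(img, inverse=False):
    """Naive 2-D discrete Fourier transform of a list of rows (small images only)."""
    rows, cols = len(img), len(img[0])
    sign = 1.0 if inverse else -1.0
    out = [[0j] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            total = 0j
            for y in range(rows):
                for x in range(cols):
                    angle = sign * 2.0 * math.pi * (u * y / rows + v * x / cols)
                    total += img[y][x] * cmath.exp(1j * angle)
            # The inverse transform carries the 1/(rows*cols) normalization.
            out[u][v] = total / (rows * cols) if inverse else total
    return out

def swap_amplitude_phase(img_a, img_b):
    """Build an image from the FT amplitude of img_a and the FT phase of img_b."""
    fa, fb = dft2(img_a), dft2(img_b)
    mixed = [
        [abs(a) * cmath.exp(1j * cmath.phase(b)) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(fa, fb)
    ]
    # For real inputs the mixed spectrum is conjugate-symmetric, so the
    # imaginary part of the inverse transform is only rounding error.
    return [[z.real for z in row] for row in dft2(mixed, inverse=True)]
```

Calling `swap_amplitude_phase(a, b)` and `swap_amplitude_phase(b, a)` on two equally sized images gives the two bottom images of such a comparison; mean and RMS statistics, by contrast, depend only on amplitude information.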

Figure 2 compares the performance of the mean ΔT and root mean square (RMS) metrics for a
standard thermal scene of an M60 tank. Each column contains a series of images, each filtered by a one octave
band pass filter with a center frequency corresponding to feature sizes of 4, 8, 16, etc. pixels. The right
hand column contains the original scene with no target present. The next column contains both the original
target and background with no modifications. The two columns on the left contain modified imagery where
the target image is modified relative to the background by performing a correction that subtracts either the
mean or RMS ΔT value from each pixel. The modified target should be indistinguishable from the
background if in fact either metric provides a good measure of target/background matching.

A good match occurs for feature sizes (i.e. 32 pixels) approximately the size of the target dimensions.
The target virtually disappears for the 32 pixel feature size filter while becoming quite visible for the other high
spatial frequency band pass filters. Particularly note the high contrast horizontal edges, which become quite
visible at the higher spatial frequencies. It is evident that a simple metric description of the target/background
scene is not sufficient to eliminate complex cue features.

Figure 1 Switching the amplitude and phase of the Fourier transform of two images

Figure 2 A comparison of the mean ΔT and RMS signature metrics for different band pass filters

2. TARDEC VISUAL MODEL (TVM)


This section describes a more complex, but robust, model of the human vision system. The
human vision system begins with the retina and the early vision channels and ends with higher level image
understanding. The early vision system [3, 4, 5] consists of a set of parallel channels which analyze feature
characteristics such as motion, color, spatial frequency content, orientation, etc. by a complex mapping of
the photoreceptor cells in the retina through the lateral geniculate nucleus onto the visual cortex. The higher
level image understanding is not currently well understood, and additional research needs to be done to
establish a predictive model. The research community has, however, done a good deal of work in
developing computational models of the early vision system.

The TVM model assumes that defeating the early vision system is sufficient to defeat human
vision for target detection applications relating to low signature vehicles. Figure 3 outlines the TARDEC
Computational Vision Model (CVM), starting with the color opponent black-white, red-green and yellow-blue
decomposition. The next step consists of a spatial frequency decomposition of each color opponent
channel into one octave wide band pass filters with different center frequencies and orientations. The model
defines a signal to noise ratio for each channel and aggregates these over all channels to define the d' human
performance parameter. Signal detection theory uses the d' parameter to predict single glimpse probabilities
of detection (Pd) and false alarm (Pfa). The model includes both foveal vision and peripheral off axis
eccentricity effects in calculating d'. A search model then aggregates the single glimpse model for calculating
time dependent Pd's and Pfa's.
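
How a d' value maps to single glimpse Pd and Pfa can be illustrated with the textbook equal-variance Gaussian form of signal detection theory. That form, the decision `criterion` parameter, and the function names below are assumptions made for illustration; the text does not state which variant TVM uses.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def single_glimpse_probabilities(d_prime, criterion):
    """Return (Pd, Pfa) for an equal-variance Gaussian observer (assumed model).

    Noise-only responses are N(0, 1), target responses are N(d_prime, 1),
    and the observer reports a target when the response exceeds `criterion`.
    """
    p_fa = 1.0 - normal_cdf(criterion)
    p_d = 1.0 - normal_cdf(criterion - d_prime)
    return p_d, p_fa

def roc_curve(d_prime, criteria):
    """Sweep the criterion to trace (Pfa, Pd) points of one ROC curve."""
    points = []
    for c in criteria:
        p_d, p_fa = single_glimpse_probabilities(d_prime, c)
        points.append((p_fa, p_d))
    return points
```

Each d' value selects one ROC curve; moving the criterion slides the operating point along it, which is one way to picture observers with different thresholds.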

2.1  TVM  Signature  Vector 

The human visual system [5] samples contrast gradients with a series of Gaussian filters approximating
directional spatial derivatives. Each filter performs a band pass operation in one spatial dimension and a low
pass operation in the orthogonal direction. The filters are implemented sequentially with a simple three pixel
kernel in each of two directions. The set of band pass filters is implemented at six distinct center
frequencies differing by a factor of two in spatial frequency, and as a pyramidal
hierarchy of filters in which the image input to the next lower spatial frequency is obtained as a residual of the
filtering operations from the next higher spatial frequency.
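
The filter hierarchy above can be sketched for a single color opponent channel as follows. The kernel weights (`SMOOTH`, `DERIV`), the border clamping, and the factor-of-two subsampling between levels are assumptions chosen for illustration; the text only specifies three pixel kernels, two orientations, and six center frequencies one octave apart.

```python
SMOOTH = (0.25, 0.5, 0.25)  # assumed 3-pixel low pass kernel
DERIV = (-0.5, 0.0, 0.5)    # assumed 3-pixel derivative (band pass) kernel

def filter_axis(img, kernel, horizontal):
    """Apply a 3-tap kernel along rows (horizontal=True) or columns, clamping borders."""
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            acc = 0.0
            for offset, weight in zip((-1, 0, 1), kernel):
                yy, xx = (y, x + offset) if horizontal else (y + offset, x)
                yy = min(max(yy, 0), rows - 1)
                xx = min(max(xx, 0), cols - 1)
                acc += weight * img[yy][xx]
            out[y][x] = acc
    return out

def pyramid_channels(img, levels=6):
    """Return a list of (level, orientation, filtered image) tuples."""
    channels = []
    current = img
    for level in range(levels):
        # Band pass along x, low pass along y: responds to vertical edges.
        dx = filter_axis(filter_axis(current, SMOOTH, False), DERIV, True)
        # Band pass along y, low pass along x: responds to horizontal edges.
        dy = filter_axis(filter_axis(current, SMOOTH, True), DERIV, False)
        channels.append((level, "x", dx))
        channels.append((level, "y", dy))
        # The low pass residual, subsampled by two, feeds the next lower frequency.
        low = filter_axis(filter_axis(current, SMOOTH, True), SMOOTH, False)
        current = [row[::2] for row in low[::2]]
    return channels
```

With six levels and two orientations this yields 12 filtered images per color opponent channel, so three color channels give 36.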

A typical configuration is a set of 36 band pass filters consisting of three color opponent channels,
each divided into two orientations with six center frequencies. Each filter output is an image which contains
some particular subset of the original image content and cue features. A measure of image content is the
"contrast modulation energy": the target outline is projected onto each image, and the average energy is
computed over the target area and its immediate local background. Figure 4 illustrates the process, which is
remarkably similar to conventional amplitude modulation signal detection. To extract the "energy envelope
function" from the contrast modulation function, one first squares the original signal and then applies a low
pass operation for each band pass filter.
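
The square-then-low-pass step can be shown on a one-dimensional signal. This is a sketch under stated assumptions: a moving average stands in for the low pass operation, and `energy_envelope` and its `window` parameter are illustrative names, not taken from the paper.

```python
def energy_envelope(signal, window):
    """Square the band pass signal, then smooth it with a moving average.

    The moving average is an assumed stand-in for the low pass operation;
    the window is shortened at the ends of the signal.
    """
    squared = [s * s for s in signal]
    half = window // 2
    envelope = []
    for i in range(len(squared)):
        lo = max(0, i - half)
        hi = min(len(squared), i - half + window)
        envelope.append(sum(squared[lo:hi]) / (hi - lo))
    return envelope
```

For a sinusoidal modulation of amplitude A, averaging the squared signal over whole periods gives an energy of A²/2, so a region where the filter response is stronger shows up as a raised envelope.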

The  difference  in  mean  energy  and  variance  between  the  target  and  background  are  two  measures 
of  the  contrast  modulation  function.  A  single  signature  metric  combines  these  two  measures  for  each  band 
pass  filtered  image  using  the  following  mathematical  operations: 


$$\Delta\mathrm{SIGNAL} = \underbrace{\left|\sigma_{tgt} - \sigma_{bkg}\right|}_{\text{Change in Energy Variance}} + \underbrace{\left|\mu_{tgt} - \mu_{bkg}\right|}_{\text{Change in Energy}} \tag{1}$$

$$\mathrm{Noise} = \sqrt{\text{ICN noise power} + \text{BKG noise power}} \tag{3}$$

Figure 3 Flow chart for the TVM model. Chart labels: Channel S/N, formed from [TGT_SIGNAL - BKG_SIGNAL] over the combined visual and clutter noise; Internal Structure; Large Area Contrast; Feature Patterns and Figural Unity; Aggregate over channel S/N; Pr Det on any Channel = 1 - Π over channels (1 - Pr Det on channel); Receiver Operating Characteristic (Single Glimpse).

Figure 5 Dual use applications of the TVM model


Equations 1-3 define the necessary parameters for a single channel signal to noise ratio (S/N). The noise
term includes internal eye noise, which is a function of illumination level, and a clutter noise term, which is
estimated from the background statistics of the contrast modulation energy.
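
A single channel S/N could be computed along the following lines. This is a hedged sketch, not the authors' implementation: it assumes the change-in-energy term is the absolute difference of mean energies, the variance term is the absolute difference of standard deviations, and S/N is the ratio of ΔSIGNAL to Noise. The function name and the two noise-power arguments are hypothetical.

```python
import math
import statistics

def channel_snr(tgt_energy, bkg_energy, eye_noise_power, clutter_noise_power):
    """Single channel S/N from contrast modulation energy samples (assumed form).

    tgt_energy, bkg_energy: energy values sampled over the target area and
    over its immediate local background for one band pass filtered image.
    """
    # Change in energy spread between target and background.
    delta_sigma = abs(statistics.pstdev(tgt_energy) - statistics.pstdev(bkg_energy))
    # Change in mean energy between target and background.
    delta_mean = abs(statistics.mean(tgt_energy) - statistics.mean(bkg_energy))
    # Noise powers add; the noise amplitude is the square root of the sum.
    noise = math.sqrt(eye_noise_power + clutter_noise_power)
    return (delta_sigma + delta_mean) / noise
```

Repeating this for every color opponent channel, orientation, and center frequency would fill the 36-element signature vector.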

The signature vector consists of 36 S/N ratios, one for each channel output. A single "detectability"
metric weights each signature vector component by the number of receptive fields for that band
pass filter. Next the model "pools" these elements using a cortical model by Watson, which includes the
density of receptive fields sensitive to any spatial frequency as a function of
distance from the fovea. Equations 4-6 contain a mathematical description of the process, where the c, a, f
indices refer to a specific signature vector component with a particular color opponent component,
orientation, and center frequency. The quantity d is the "detectability" metric, which is related
to d'. The latter predicts a particular ROC curve (i.e. Fig. 3). The exponent on the S/N ratio is
approximately two, corresponding to an ideal observer model for Signal Detection Theory.
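
Equations 4-6 are not reproduced in this text, so the following is only a generic illustration of weighted pooling with an exponent of two. The Minkowski-style sum, the function name, and the `weights` argument (standing in for receptive field counts) are assumptions, not the published TVM equations.

```python
def pooled_detectability(snr, weights, exponent=2.0):
    """Pool channel S/N values into one detectability number (assumed form).

    snr: one S/N ratio per (color, orientation, frequency) channel.
    weights: assumed per-channel weights, e.g. relative receptive field counts.
    """
    if len(snr) != len(weights):
        raise ValueError("snr and weights must have the same length")
    total = sum(w * (s ** exponent) for s, w in zip(snr, weights))
    return total ** (1.0 / exponent)
```

With an exponent of two and unit weights this reduces to the root sum of squares of the channel S/N values.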


Figure 4. The process for extracting the energy envelope function from a band pass filter contrast modulation
function.

3. TVM CALIBRATION TO OBSERVER PERFORMANCE

TVM uses a distribution of observers because individuals have different thresholds corresponding
to varying probabilities of detection and false alarm. The parameters Ψ1 and Ψ2 are empirical and are
calibrated using observer field test data. The pilot study used 10 military observers viewing six sets of
six separate target scenes. The imagery consisted of low contrast target/background
scenes near detectability thresholds, which were presented 20 times to each observer in random order.

The hypothesis that the Ψ1 and Ψ2 parameters are approximately constant for the observer
population was well supported by the data, where Ψ1 = 3.0 ± 0.75 and Ψ2 = 0.12 ± 0.016.
Though a good deal of additional calibration testing needs to be done, the preliminary
results are quite encouraging because the values for Ψ1 and Ψ2 were approximately constant for several
observers over several thousand observations.

4.  SEARCH 

Since the pilot study used observer tests where the subject viewing times were relatively short, the
model produces single glimpse probabilities of detection and false alarm. Time dependent detection and
false alarm probabilities are obtained by aggregating multiple glimpses through some search
strategy. The search process can be modeled in several ways depending upon one's particular knowledge
of the search process. If eye tracking data is available, for example, then a search strategy can be devised
which is a function of the probability of eye fixation and its position in the scene. For most models,
however, this type of information is usually not appropriate and other types of search models must be used.
The TVM model uses a Markov process which consists of three transient states: (1) the observer is cued to
the target, (2) the observer is cued to a background object, and (3) the observer is not cued to any specific scene
object. The result is a set of transition rates for detection, false alarm and quitting, involving single glimpse
probabilities, eye integration time, the probability of the target being in the field of view and various other
integral expressions over the search field of regard.
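
The three-state search process can be illustrated with a discrete-glimpse Markov chain. This is a simplified sketch: the per-glimpse transition probabilities (`p_cue_tgt`, `p_cue_bkg`, `p_quit`) and the rule that the observer returns to the uncued state after an unsuccessful glimpse are assumptions for illustration, whereas the TVM rates are derived from integral expressions over the field of regard.

```python
def search_probabilities(p_d, p_fa, p_cue_tgt, p_cue_bkg, p_quit, glimpses):
    """Return cumulative (Pd, Pfa) after each glimpse for an assumed Markov search.

    Transient states: cued to target, cued to background object, not cued.
    Absorbing states: detection, false alarm, quit.
    """
    stay = 1.0 - p_cue_tgt - p_cue_bkg - p_quit
    if stay < 0.0:
        raise ValueError("cue and quit probabilities must sum to at most 1")
    state = {"none": 1.0, "target": 0.0, "background": 0.0,
             "detect": 0.0, "false_alarm": 0.0, "quit": 0.0}
    history = []
    for _ in range(glimpses):
        nxt = {"none": 0.0, "target": 0.0, "background": 0.0,
               "detect": state["detect"],
               "false_alarm": state["false_alarm"],
               "quit": state["quit"]}
        # Cued to target: detect with the single glimpse Pd, else lose the cue.
        nxt["detect"] += state["target"] * p_d
        nxt["none"] += state["target"] * (1.0 - p_d)
        # Cued to background: false alarm with the single glimpse Pfa.
        nxt["false_alarm"] += state["background"] * p_fa
        nxt["none"] += state["background"] * (1.0 - p_fa)
        # Not cued: quit, pick up a cue, or keep searching.
        nxt["quit"] += state["none"] * p_quit
        nxt["target"] += state["none"] * p_cue_tgt
        nxt["background"] += state["none"] * p_cue_bkg
        nxt["none"] += state["none"] * stay
        state = nxt
        history.append((state["detect"], state["false_alarm"]))
    return history
```

Multiplying the glimpse index by an eye integration time turns the returned sequence into time dependent Pd and Pfa curves.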

5.  TVM  APPLICATIONS 

The original model development objective was to develop better ground vehicle requirements and
specifications for low signature vehicles and camouflage, concealment, and deception (CCD) applications.
Previous acquisition models for man in the loop imaging systems used a series of ad hoc signature metrics
which averaged over an ensemble of targets and consequently are not sensitive to specific target cue
features and characteristics. Figure 5 illustrates the successful application to the Requirements Translation
problem. The top portion of Fig. 5 shows three images of a high contrast vehicle in a color scene. The images
in the upper right show different levels of signature suppression at different S/N ratios using the
computational vision model outlined in the previous text. Note that when the S/N = 0, very little residual
signature is left, indicating that the multi-channel band pass filter model captures nearly all of the relevant
target/background difference information. Similar comparisons using conventional ad hoc signature metrics
such as contrast for visible imagery or ΔT for thermal imagery would not work nearly as well.

The bottom portion of Fig. 5 illustrates a dual use application of TVM to passive collision avoidance.
This work is part of a joint TARDEC/GM CRDA to develop various metrics for evaluating the conspicuity of
automobiles for collision avoidance applications. Note the difference between the on/off brake light
configurations for the different color opponent channels with a specific high spatial frequency band pass
filter sensitive to horizontal edges. The figure shows the three color channels for the brake lights on and off
and the difference between them. The joint effort with GM begins in FY94 and will develop and validate
the model, using proving ground and test facilities for final model validation and accreditation. Additional
dual use applications include photo interpretation, machine vision and automatic object recognition.

SUMMARY 

The first phase of the TVM model development program will be completed at the end of FY94,
including final documentation and validation for both pop-out and low signature targets. The model will treat
both stationary and dynamic targets in complex background scenes, analyze color scenes, provide a
visualization module (i.e. Fig. 5), and have an acquisition module for insertion into the combat models. A
follow on phase beginning in FY95 will incorporate higher levels of target/background discrimination
applications and object recognition.